Fake Job Posting Detection

  • Tech Stack: Data Mining, SMOTE, Bernoulli Naive Bayes, Exact Bayes, Laplace Smoothing
  • Link: Project Link

Developed a machine learning pipeline to detect fraudulent job postings using probabilistic classifiers and a dialer-based thresholding system. The project identified fake listings in a dataset of over 500,000 job records. It compared Bernoulli Naive Bayes and Exact Bayes classifiers, with SMOTE for class balancing and Laplace smoothing to keep rarely seen feature values from producing zero-probability estimates. A custom dialer system was designed to test and tune classification thresholds in real time, so that each cutoff could be judged across multiple evaluation metrics.
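To make the role of Laplace smoothing concrete, here is a minimal from-scratch sketch of a Bernoulli Naive Bayes classifier. It is not the project's code: the function names (fit_bernoulli_nb, predict_fraud_proba), the binary feature encoding, and the toy data are all assumptions made for illustration.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli Naive Bayes on binary features (hypothetical helper).

    X: list of equal-length lists of 0/1 values; y: list of 0/1 labels.
    alpha: Laplace smoothing constant (1.0 is add-one smoothing).
    Assumes both classes appear at least once in y.
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        # Smoothed estimate of P(feature_j = 1 | class c).
        # Adding alpha keeps every estimate strictly between 0 and 1.
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = {"prior": len(rows) / len(X), "probs": probs}
    return model


def predict_fraud_proba(model, x):
    """Return P(class = 1 | x) for one binary feature vector."""
    log_joint = {}
    for c, params in model.items():
        total = math.log(params["prior"])
        for xj, p in zip(x, params["probs"]):
            total += math.log(p if xj else 1.0 - p)
        log_joint[c] = total
    # Normalise in log space for numerical stability.
    m = max(log_joint.values())
    exp = {c: math.exp(v - m) for c, v in log_joint.items()}
    return exp[1] / (exp[0] + exp[1])


# Toy data (made up): feature 0 is always 1 for fraud and always 0 otherwise.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0]
model = fit_bernoulli_nb(X, y)
fraud_probability = predict_fraud_proba(model, [1, 0, 1])
```

Without smoothing, feature 0 would get an estimated probability of exactly 0 or 1 in each class, and a single unseen combination would drive the log-likelihood to negative infinity. With alpha set to 1.0 the estimates become 0.75 and 0.25 instead.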

Key features of the project include:

  • Improved fraud classification by applying Bernoulli and Exact Bayes with class rebalancing via SMOTE.
  • Enabled dynamic performance tuning by adjusting fraud-detection cutoffs based on model confidence.
  • Achieved F1 scores of 0.93 (Bernoulli NB) and 0.96 (Exact Bayes), validating model robustness on imbalanced data.
  • Benchmarked model performance and fine-tuned feature selection to minimize runtime and increase accuracy.
  • Applied Laplace smoothing to handle the sparse feature space, stabilizing predictions and reducing false positives.
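The class rebalancing step can be sketched as follows. This is a simplified, standard-library illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the project's implementation. The function name smote, the parameter k, and the sample points are assumptions.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (simplified SMOTE sketch).

    minority: list of numeric feature vectors from the minority class
    (at least two are required). k: number of nearest neighbours to draw from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up minority samples for illustration only.
fraud_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra_fraud = smote(fraud_samples, n_new=5)
```

Interpolation produces fractional values, whereas Bernoulli Naive Bayes expects binary features. The write-up does not say how the two were reconciled, so any binarization step after oversampling would be a further assumption.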

Key Insights

  • Exact Bayes consistently outperformed Bernoulli NB, especially at lower thresholds with SMOTE applied.
  • A mid-range threshold (0.35 to 0.45) yielded the best trade-off between false positives and recall.
  • Optimized features led to a ~20% speed gain while preserving model accuracy.
  • Job title, employment type, and location contributed most to fraud detection across both classifiers.
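The threshold tuning described above can be illustrated with a small sweep that recomputes precision, recall, and F1 at each candidate cutoff. This is a sketch under stated assumptions, not the project's dialer: the helper names (metrics_at_threshold, sweep_thresholds) are hypothetical, and the probabilities and labels are toy values, not project results.

```python
def metrics_at_threshold(probs, labels, threshold):
    """Precision, recall, and F1 when probs >= threshold is flagged as fraud."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def sweep_thresholds(probs, labels, thresholds):
    """Map each candidate threshold to its (precision, recall, F1) tuple."""
    return {t: metrics_at_threshold(probs, labels, t) for t in thresholds}


# Toy predicted fraud probabilities and true labels (made up).
probs = [0.9, 0.4, 0.35, 0.2, 0.6, 0.1]
labels = [1, 1, 0, 0, 1, 0]
results = sweep_thresholds(probs, labels, [0.3, 0.4, 0.5])
best_threshold = max(results, key=lambda t: results[t][2])
```

Lowering the cutoff catches more fraud (higher recall) at the cost of more false alarms (lower precision). Sweeping over a grid makes that trade-off visible, so a cutoff can be picked on F1 or on whichever metric matters most.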